Skip to content

Conversation

@zhexuany
Copy link
Contributor

@zhexuany zhexuany commented Feb 8, 2026

This commit fixes issues with ROS1 bag string decoding and schema handling:

Changes

  1. ROS1 string decoding (src/encoding/cdr/decoder.rs):

    • CDR decoder now correctly handles ROS1 strings where length is the exact byte count (no null terminator)
    • Unlike ROS2 CDR strings where length includes null terminator
    • Added is_ros1() check in read_string() method
  2. Builtin type variant conflict (src/schema/parser/unified.rs):

    • populate_builtin_types now skips adding "/msg/" variant when the non-prefixed variant was already parsed from message_definition
    • Prevents ambiguous short-name resolution in get_type_variants
    • Example: skips std_msgs/msg/Header when std_msgs/Header was parsed from ROS1 message_definition (which includes the seq field)
  3. Test coverage (tests/ros1_decode_dynamic_tests.rs):

    • Added test_decode_dynamic_tf_topic to verify /tf (tf2_msgs/TFMessage) decoding
    • Exercises nested Header resolution and ROS1 string reading
    • Uses fixture robocodec_test_24_leju_claw.bag

Impact

Fixes decoding of /tf messages from ROS1 bags containing std_msgs/Header with the seq field (ROS1 variant).

…ling

This commit fixes several issues with ROS1 bag decoding and schema handling:

1. **ROS1 string decoding**: CDR decoder now correctly handles ROS1 strings
   where length is the exact byte count (no null terminator), unlike ROS2
   CDR strings where length includes null terminator.

2. **MCAP writer schema encoding**: Fixed add_channel_with_schema to use
   correct schema encoding names ("ros2msg", "jsonschema", "protobuf")
   instead of message encoding, and pass message encoding to add_channel.

3. **Builtin type variant conflict**: populate_builtin_types now skips
   adding "/msg/" variant when the non-prefixed variant was already parsed
   from message_definition (e.g., std_msgs/Header from ROS1 with seq field).

4. **Test coverage**: Added test_decode_dynamic_tf_topic to verify /tf
   (tf2_msgs/TFMessage) decoding with nested Header and ROS1 strings.

Fixes decoding of /tf messages from ROS1 bags containing std_msgs/Header
with the seq field (ROS1 variant).
@codecov
Copy link

codecov bot commented Feb 8, 2026

Codecov Report

❌ Patch coverage is 95.65217% with 1 line in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/schema/parser/unified.rs 92.30% 1 Missing ⚠️

📢 Thoughts on this report? Let us know!

@greptile-apps
Copy link

greptile-apps bot commented Feb 8, 2026

Greptile Overview

Greptile Summary

This PR fixes ROS1 decoding edge cases by teaching the CDR decoder to treat ROS1 strings as length-prefixed without a null terminator, adds a regression test for decoding /tf (tf2_msgs/TFMessage) from a ROS1 bag fixture, and tweaks schema population to avoid builtin /msg/ variants shadowing parsed ROS1 definitions.

It also adjusts the MCAP writer’s channel/schema metadata: schema names now use message_type directly, schema encodings are set based on the payload type, and channels now store the message encoding (e.g., cdr, json, protobuf) instead of the schema encoding. These writer metadata changes feed into CodecFactory::detect_encoding and schema parsing via parse_schema_with_encoding.

Confidence Score: 3/5

  • This PR is close to mergeable but has a couple of correctness issues in encoding metadata and error handling that should be fixed first.
  • Core ROS1 string fix and schema-variant conflict handling look correct and are covered by integration tests, but (1) the MCAP writer now guesses schema encoding with a heuristic that can mislabel ROS1 vs ROS2 schemas and break downstream decoding, and (2) the CDR string reader drops errors when skipping the ROS2 null terminator, which can hide truncated inputs.
  • src/io/formats/mcap/writer.rs, src/encoding/cdr/decoder.rs

Important Files Changed

Filename Overview
src/encoding/cdr/decoder.rs Adds ROS1 string handling in read_string; but ROS2 branch still ignores errors when skipping null terminator (cursor.read_u8() result dropped), which can hide truncation.
src/io/formats/mcap/writer.rs Changes MCAP schema/channel encoding wiring; however schema encoding is now guessed with a heuristic that can mislabel ROS1 vs ROS2 message schemas (e.g., slash-delimited types become ros2msg).
src/lib.rs Exposes io::metadata::FileFormat under a #[cfg(test)] module for tests; no functional impact on non-test builds.
src/schema/parser/unified.rs Adjusts builtin type population to avoid adding /msg/ variants when a parsed variant already exists (prevents ambiguous resolution); change appears safe and localized.
tests/ros1_decode_dynamic_tests.rs Adds a targeted /tf decode_dynamic regression test using an existing ROS1 bag fixture; test logic is consistent with existing integration tests.

Sequence Diagram

sequenceDiagram
  participant W as ParallelMcapWriter
  participant MCAP as MCAP file
  participant F as CodecFactory
  participant SM as SchemaMetadata
  participant P as Schema parser
  participant C as CdrCodec/CdrDecoder

  W->>MCAP: "add_schema(name=message_type, encoding=schema_encoding, data=schema)"
  W->>MCAP: "add_channel(topic, message_encoding=encoding, schema_id)"

  F->>MCAP: "read ChannelInfo (encoding, schema_encoding, schema)"
  F->>F: "detect_encoding(channel.encoding, channel.schema_encoding)"
  F->>SM: "SchemaMetadata::cdr_with_encoding(type, schema, schema_encoding)"
  SM->>P: "parse_schema_with_encoding(type, schema, schema_encoding)"
  P-->>SM: "MessageSchema"

  F->>C: "get_codec(Cdr)"
  C->>C: "decode_dynamic(data, SchemaMetadata)"
  alt schema_encoding indicates ROS1
    C->>C: "read_string() reads len bytes"
  else ROS2/CDR
    C->>C: "read_string() reads len-1 bytes and skips terminator"
  end
  C-->>F: "DecodedMessage"

Loading

@zhexuany zhexuany changed the title fix: ROS1 string decoding, /tf topic support, and schema variant handling fix: ROS1 string decoding and builtin type variant handling Feb 8, 2026
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

@greptile-apps
Copy link

greptile-apps bot commented Feb 8, 2026

Additional Comments (2)

src/encoding/cdr/decoder.rs
Ignored terminator read
In the ROS2/CDR branch, the null terminator is skipped with let _ = cursor.read_u8();, which drops any read error. For truncated/malformed inputs where len claims a terminator byte but the buffer ends right after the string bytes, this will incorrectly return Ok and hide the truncation. This should propagate the error (and ideally validate the terminator is 0).

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/encoding/cdr/decoder.rs
Line: 618:626

Comment:
**Ignored terminator read**
In the ROS2/CDR branch, the null terminator is skipped with `let _ = cursor.read_u8();`, which drops any read error. For truncated/malformed inputs where `len` claims a terminator byte but the buffer ends right after the string bytes, this will incorrectly return `Ok` and hide the truncation. This should propagate the error (and ideally validate the terminator is `0`).

How can I resolve this? If you propose a fix, please make it concise.

src/io/formats/mcap/writer.rs
Schema encoding misclassified
schema_encoding is inferred via encoding == "cdr" && (message_type.contains('/') || message_type.contains("msg")), which will label any slash-delimited ROS type (including ROS1 pkg/Type) as "ros2msg". This will cause downstream detect_encoding / schema parsing to treat ROS1 schemas as ROS2 (and vice versa), breaking decoding for ROS1-written MCAPs. The schema encoding needs to be sourced from the actual schema/version context (e.g., ros1msg vs ros2msg) rather than this heuristic.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/io/formats/mcap/writer.rs
Line: 1084:1101

Comment:
**Schema encoding misclassified**
`schema_encoding` is inferred via `encoding == "cdr" && (message_type.contains('/') || message_type.contains("msg"))`, which will label any slash-delimited ROS type (including ROS1 `pkg/Type`) as `"ros2msg"`. This will cause downstream `detect_encoding` / schema parsing to treat ROS1 schemas as ROS2 (and vice versa), breaking decoding for ROS1-written MCAPs. The schema encoding needs to be sourced from the actual schema/version context (e.g., `ros1msg` vs `ros2msg`) rather than this heuristic.

How can I resolve this? If you propose a fix, please make it concise.

@zhexuany zhexuany merged commit 8f95b00 into main Feb 9, 2026
15 checks passed
@zhexuany zhexuany deleted the fix/ros2-idl-array-alignment branch February 9, 2026 02:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant